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Abstract 

This paper is concerned with a lesser-studied problem in the context of model- 
based, uncertainty quantification (UQ), that of optimization/design/control un¬ 
der uncertainty. The solution of such problems is hindered not only by the usual 
difficulties encountered in UQ tasks (e.g. the high computational cost of each 
forward simulation, the large number of random variables) but also by the need 
to solve a nonlinear optimization problem involving large numbers of design vari¬ 
ables and potentially constraints. We propose a framework that is suitable for 
a large class of such problems and is based on the idea of recasting them as 
probabilistic inference tasks. To that end, we propose a Variational Bayesian 
(VB) formulation and an iterative VB-Expectation-Maximization scheme that is 
also capable of identifying a low-dimensional set of directions in the design space, 
along which, the objective exhibits the largest sensitivity. We demonstrate the 
validity of the proposed approach in the context of two numerical examples in¬ 
volving (T(10 3 ) random and design variables. In all cases considered the cost 
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of the computations in terms of calls to the forward model was of the order 
0( 10 1 2 ). The accuracy of the approximations provided is assessed by appropriate 
information-theoretic metrics^ 

Keywords: Uncertainty Quantification, Variational Bayes, Optimization, 
Dimensionality reduction, Dictionary Learning 


1. Introduction-Motivation 

With the increased computational capabilities afforded by the utilization of 
peta- and exa-scale computing resources throughout engineering and the physical 
sciences, the issue of confidence in simulation results has come at the center of 
current research. The objective of obtaining a nominal computational represen¬ 
tation of a physical process is being replaced by the new paradigm of predictive 
simulations where the analysis delivers a quantification of uncertainty due to ran¬ 
domness in parameters, data or models. Decisions that are based on high-fidelity 
computational simulations due to their potential economic or societal impact 
cannot be accepted without quantitative information on the confidence in the 
computed result. 

The field of model-based, uncertainty quantification has seen marked ad¬ 
vances in recent years. Naturally, the majority of the efforts have been directed 
towards forward uncertainty propagation i.e. the computation of output statis¬ 
tics given input uncertainties. While several important challenges still remain 
unanswered, the ultimate objective of the analysis of physical processes and en¬ 
gineering systems is to enable their control and optimization with respect to 

1 This paper is based on the homonymous talk given during the international symposium 
on "Big Data and Predictive Computational Modeling" that took place in 18-21 May 2015 at 

TUM-IAS, Munich Germany. 
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design objectives. Problems of optimization in the presence of uncertainty have 
attracted much less attention. On one hand, this is because they encompass 
all the difficulties encountered in uncertainty propagation. First and foremost 
the complexity of the forward problem and the increased computational expense 
associated which each call to the forward solver. It is generally the number of 
such forward solves that determines the overall computational cost. Secondly, 
the high-dimensionality of the vector of random variables. Especially in cases 
where spatiotemporal discretizations of random processes and fields are neces¬ 
sary, one must frequently deal with thousands of random variables. Furthermore, 
in stochastic optimization problems, there is the additional need to solve a de¬ 
manding, nonlinear optimization problem which might itself involve thousands of 
design variables as well as equality/inequality constraints, 

Significant advances have been achieved in deterministic optimization and 
control of comp ex systems particularly with the development of adjoint-based 
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imques [i, 1, J|| as well as by making use of reduced-order modeling techniques 
5|. Nevertheless their direct application in the stochastic counterparts of 
these problems would be infeasible or impractical as the integration with respect 
to uncertainties poses an insurmountable task. 

While decision-making under uncertainty was pioneered in the 1950s [fjj|, 
applications to large-scale physical models are scarce due to the inherent com¬ 
putational difficulties. Advances in stochastic/robust control and optimization 
[7, 8, 9] or reliability-based design optimization JlO, 11] are generally applicable 
to small systems or rely on specific system structure. Techniques using surrogate 


models and response surfaces 


12] or generalized Polynomial Chaos expansions 


13| might fail to provide good approximations if the number of uncertainties 
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is large, irreducible or non-Gaussian. Furthermore, there is a difficulty in quan¬ 
tifying the error introduced due to the discrepancy between the surrogate and 
reference model. A critical problem in that respect is the ability to deal with noisy 
evaluations of the objective functions, its gradient and higher-order derivatives. 

The stochastic optimization framework advocated in the present paper is 
motivated by the following desiderata: 

• The ability to seamlessly utilize deterministic (legacy) simulators and de¬ 
terministic optimization components such as a first and second order para¬ 
metric derivatives of model outputs. 

• The ability to deal with high-dimensional vectors of random and design 
variables. 

• Least possible number of forward solutions 

• The ability to quantify the robustness of the identified optimum and provide 
information on the design features that exhibit the largest sensitivity. 

• The ability to utilize even highly-approximate, reduced-order models or 
surrogates in order to expedite the solution process. 

The objective functions considered in this paper can be written in a general 


form as: 



( 1 ) 


where 6 E denotes the vector of random variables with a probability density 
function pe(0) and z E U dz denotes the vector of design variables. The function 
U (0, z) depends on the output of the mathematical model and in turn, implicitly 
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depends on random and design variables. Each evaluation of U(0,z) implies a 
forward model solution which is assumed expensive as in most challenging appli¬ 
cations. Naturally the optimization problem can be augmented with constraints 
with regards to the design variables as it will be demonstrated in the stochastic 
topology optimization problem that will be considered in the last section. We 
adopt the term utility function (opposite of a loss function) for U(6, z ) and ex¬ 
pected utility for V(z) and, without loss of generality, pose the corresponding 
problem as one of maximization. 

The formulation above is quite general and can be readily adapted to cases 
of practical interest. For example if U ( 6 , z) = 1 ^(0, z ) is the indicator function 
of an event A of interest (e.g. failure, or exceedance of a response threshold) 
then maximizing V(z) in Equation (OD) is equivalent to the maximization of the 
probability associated with the event A (similarly one can minimize the probability 
of event A by employing the indicator function of the complementary even A c 
in place of U in Equation (OQ)). The case that would be of principal concern in 
this paper involves utility functions of the following form |^: 

U(0,z) =exp{ —i || Q l/2 {u target - u(0,z)) || 2 } (2) 

where u(0,z) E R n denotes an output vector of interest (i.e. displacements, 
velocities, temperature etc), u target E a target/desired response and Q a 
positive definite matrix of choice (in the current examples Q = tqI„). Maxi¬ 
mizing the corresponding expected utility implies finding z for which the response 

2 As it will become apparent in the subsequent derivations, the exponent in Equation (j2j) 
is used in order to simplify the presentation and several other options to the same effect are 
possible. 
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quantities of interest are, on average, as close (in the norm defined by Q ) to the 
target values u tar get ■ Similar objective functions have been employed by 13] to 
identify random composites with target effective/homogenized properties and in 
151 ] in the context of computational mechanics. In addition related stochastic 


design/control objectives have been proposed in 


16] and 


a. 


The obvious strategy for maximizing the expected utility in Equation ([I]) is 
stochastic approximations such as noisy gradient ascent which, in its simplest 
form, iterates as follows: 

Z [t+ 1) = * (t) + Vt9t ( 3 ) 


where g t is a noisy (unbiased) estimator of the gradient: 

dU(0,z®) 


V z V(z®) = I dU{9 dz Z } p 0 (O) cW 


( 4 ) 


and rit a sequence of learning rates that satisfy YlnLo Vt = + oo . Et=o Vt < +°° 
e convergence to a (local) maximum is assured under fairly weak 


anc 

ma 
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19] 


conditions 


20 


21] even when a single sample of 6 from pe(Q) is used in the 


context of a basic Monte Carlo estimate of g t , the convergence rate can be slow 


requiring an exuberant number of forward calls to evaluate U anc 


An alternative perspective to the problem was proposed in 


22 


/<*%■ 

1 where it was 


recast as a probabilistic inference task. In particular one defines an auxiliary 
probability density p aU x{0, z ), jointly on random and design variables, as follows: 
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Paux(0, z ) OC U(0,z)pg(0) 


(5) 


The marginal p aU x{ z ) oc f p aU x(0, z ) dd is clearly proportional to V(z). If for 
example one could sample from the joint density p aU x(Q, z )< the z—coordinates 
will be marginally distributed according to V(z) and populate regions where this 
attains its maximum value(s). 

The proposed reformulation allows for a uniform treatment of random 6 and 
design variables z. More importantly, being able to infer p aU x{ z ) (or 3 good 
approximation thereof) will not only lead to point estimates for the maxima of 
the expected utility V(z) (which coincide with the maxima of p aux ( z )) but also 
provide valuable information about the sensitivity of the latter with respect to 


z and therefore the robustness of the selected optimal c 
Monte Carlo strategies have been previously employed 


esign 


25 


231 ]. Sequential 


23| with significant 


success in identifying multiple local maxima as well as utilizing approximate, 
surrogate models to expedite the inference task. Nevertheless the computational 
cost can still be significant as they potentially require a few thousand forward 
calls. 

In this work we advocate an alternative probabilistic inference framework, 
namely Variational Bayes (VB) 27]- Such methods have risen into promi 


nence for probabilistic inference tasks in the machine learning community 


29 


28 


30]. They provide approximate inference results by solving an optimization 


problem over a family of appropriately selected probability densities with the ob- 


3 For the definition of p aU x to be valid, it suffices that U is non-negative. The formulation 
can also account for U that take negative values as long as it is bounded from below i.e. 
U(B, z) > Uq > — oo (U 0 < 0), in which case one can use U(6, z) — Uq in place of U(9 , z) 
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jective of minimizing the Kullback-Leibier divergence 


311] with the target density 


(in our case p aux )■ The success of such an approach hinges upon the selection 
of appropriate densities that have the capacity of providing good approximations 
while enabling efficient (and preferably) closed-form optimization with regards to 
their parameters. 

A pivotal role in Variational Bayesian (VB) strategies or any other inference 
method, is dimensionality reduction i.e. the identification of lower-dimensional 
features that provide the strongest signature to the random variables and as¬ 
sociated distributions. Discovering a sparse set of features has attracted great 


interest in many applications as in the representation of natural images 


32 


and a 


host of algorithms have been developed not only for finding such representations 
but also an appropriate dictionary for achieving this goal 33]- While all these 


tools are pertinent to the present problem they differ in a fundamental way. They 
are based on several data/observations/instantiations of the vector that we seek 
to represent. In our problem however we do not have such direct observations 
i.e. the data available pertains to the output of a model which is nonlinearly 
and implicitly dependent on the vector of latent variables. Furthermore we are 
primarily interested in approximating the distribution associated with this vector 
rather than the dimensionality reduction itself. More importantly, only dimen¬ 
sionality reductions that are informative about the optimization objectives should 
be sought. 

A premise validated in a series of papers on the so-called “sloppy” models 
is that in several cases there exists a limited number of parameter combinations 
to which the outputs are sensitive. The overwhelming majority of directions are 
sloppy i.e. they embody parameter correlations that have minor influence in the 
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response and correspond to removable c 

[ ri' - 

inverse problems it was found [35j, [30, 


egrees of freedom. In the context of 
that such features of the parameters 


can be associated with the eigenvectors of an appropriate Hessian or Fisher 
Information matrix corresponding to small eigenvalues. Along these lines and 
by using a fully probabilistic argumentation we develop a reciprocal probabilistic 
PC/4 0 scheme where eigenvectors of smallest variance are iteratively computed 
and are employed not only for solving the probabilistic inference problem but for 
identifying the most sensitive design parameter combinations for the stochastic 
optimization objective. 

The rest of the paper is organized as follows: The next section (Section [2D 
presents the essential ingredients of the VB framework advocated, the dimen¬ 
sionality reduction scheme proposed and an iterative, coordinate-ascent algorithm 
that enables the identification of all the unknowns. Section 3 demonstrates the 
performance and features of the proposed methodology in two problems from 
heat conduction and solid mechanics involving (T(10 3 ) random and design vari¬ 
ables. 


2. Methodology 

As discussed in the introduction we formulate the optimization-under-uncertainty 
problem as one of probabilistic inference. To that end our goal is two-fold. Firstly, 
to compute efficiently an accurate approximation of the marginal density on the 
design variables z which provides a representation of the expected utility V(z). 


4 We use the term reciprocal to distinguish from probabilistic PCA schemes [§8j where one 
is interested in identifying the directions with the largest variance. In contrast, in the current 
setting as it will be explained later on, we are interested in the directions with lowest variance 
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Secondly, to identify a lower-dimensional subspace with regards to the design 
variables 2 that provides an assessment of the solution's robustness by discover¬ 
ing the most sensitive directions i.e. the directions along which, variations in 2 
will cause the largest decrease in the expected utility V(z). Such directions have 


been proven useful in deterministic design tasks 


39]. Apart from their obvious 


utility, they can also facilitate the inference task discussed previously. More im¬ 
portantly perhaps we propose a unified framework where the identification of the 
aforementioned lower-dimensional subspace is performed simultaneously with the 
inference of the associated densities under the same Variational Bayesian objec¬ 
tive. This yields not only a highly efficient algorithm (in terms of the number 
of forward solves) but also a highly extendable framework as discussed in the 
conclusions. 

We discuss first the parametrization advocated, identify latent variables and 
model parameters (Section 12.ID and subsequently demonstrate how the asso¬ 
ciated inference and learning tasks can be simultaneously addressed in the VB 
framework (Section 1231) . We finally present validation metrics that quantitatively 
assess the quality of the approximations derived (Section [2.61) . 


2.1. Parametrization - Dimensionality Reduction 

Consider the auxiliary density p aU x{0,z) defined in Equation (EJ). This can 
be further extended by the introduction of an additional density p z (z) as follows: 

Paux{Q,z) = 1 Pz{z) ^ z= I u(G,z)p e (9) p z (z) d9 dz (6) 

where p z (z) is the analog of the regularization term in a deterministic optimiza¬ 
tion problem. In many ways Equation (!6i) is a restatement of Bayes’ rule with 
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respect to 0, 2 : 


p(0, z\data ) = 


p(data\9, z ) p(0, z) 


(7) 


p(data) 

where pe(0) p z (z) play the role of the prior, U(0,z) is the likelihood and 
Paux(9,z ) is the posterior. The connection is more apparent when one con¬ 
siders the utility function of interest in this work (Equation (J2J) ) in which case: 


Paux(,9 1 z\Utarget) 


— Ill Q 1 1 2 {utarg<it — u (0 ,Z 


e 2 


2 pe(0) Pz 


( 8 ) 


Clearly the target response u ta rget is the direct analog of the “data” in Equa¬ 
tion flTJ) and the role of marginal likelihood or model evidence term p(data) is 


played by the normalization constant Z 


16|. We make use of this connection 


frequently to motivate the modeling choices made, particularly with regards to 
the regularizations or priors which are terms that we use interchangeably. 

The inference task in such a case would be formidable given the high dimen¬ 
sionality of 6, z and the cost associated with each evaluation of U as previously 
discussed. To address this, we propose the following decomposition of the design 
variables 2 e 


P dz ■ 


* = 02 


VV^ y 


Iz 


(9) 


dzXl d z X 1 d z xd y d xl dzXl 


The motivation behind such a decomposition is quite intuitive as it resembles a 
Principal Component Analysis (PCA) model 38]. The vector p, z captures the 
central/mean value of z, y are the reduced (and latent) coordinates of 2 along 
the linear subspace spanned by the d y columns of the matrix W and rj z the 
residual “noise”. As in PCA, the premise is that a few y i.e. dy« d Z suffice 
to capture the density of 2 . In contrast though with PCA where the reduced 
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coordinates are associated with the principal directions of largest variance, the y 
employed here should do the exact opposite i.e. identify directions with smallest 
variance that imply largest sensitivity. We explain this in more detail in the next 
Section. 

The linear decomposition of a high-dimensional vector such as 2 has received 
a lot of attention in several different fields. Most commonly z represents a high¬ 
dimensional signal (e.g. an image, an audio/video recording) and W consists 


32 


40] which attempts to encode the 


of an over- or under-complete basis set 
signal as sparsely as possible. Significant advances in Compressed Sensing 


or Sparse Bayesian Learning 


4lJ 


42] have been achieved in recent years along these 


lines. A host of deterministic 


43] or probabilistic H algorithms have been 


developed for identifying the reduced-coordinates y as well as techniques for 
learning the most appropriate set of basis W (dictionary learning) i.e. the one 
that can lead to the sparsest possible representation. 

We adopt a simpler representation for the input random variables 6 e 


0 = Ve 


Ve 


( 10 ) 


,xl 


) X 1 dg X 1 


the usefulness of which will become apparent in the sequel. In a fully probabilistic 
setting all the aforementioned parameters (y,r] z ,rj g , y, z ,W, y, g ) and the corre¬ 
sponding densities arising from Equation ([6]) would be sought. Such an inference 
problem would in general be formidable particularly with regards to y, zl W, whose 
dimension is dominated by d z » 1. To address this difficulty we propose com¬ 
puting point estimates for p, z , W, fi g while quantifying the appropriate densities 
for y,ri z ,r] g . We distinguish therefore between: 
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the latent variables y, rj z , rj e . 


• and model parameters R = {y z , W, y g }. 

The computation of appropriate distributions for the latent variables and point 
estimates for R will be addressed simultaneously under the VB framework dis¬ 
cussed in the sequel. 

2.2. Variational Bayesian approximation 

Given the re-parametrization of the primal variables Q,z in Equations ([9]), 

(TTOl) . one can write the target auxiliary density as: 

U(n 9 + 'n e ,n z + Wy + 'n z )p 0 (n g + T] e )p y {y)p Vz (ri z )p l , z (n z )pw(W) 

Paux{y, v z , r)o , JrC) = - — - 

( 11 ) 

where in place of the regularization p z (z) on z we employ regularizations (priors) 
on the corresponding parameters y, rj z and y z ,W (Equation ((9]) ). As discussed 
earlier rather than approximating the whole p aux which would pose significant 
difficulties, we seek point estimates for R by maximizing the (marginal) density 

Paux (-^) ■ 

Paux{R) = I Paux (l/j V z , Ve, R ) dy drj z drj e (12) 

Such a maximization would amount to an analog of Maximum-A-Posteriori 
(MAP) estimates in a Bayesian setting. 

To that end, for any density q(y,Ti z ,rf g ) on the latent variables and by em- 
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ploying Jensen’s inequality we obtain that 


S: 


\ogPaux(R) = log Jpauxiy, ij z , r) e , R ) dy dy z drj e 

= log f q(y, l z , rjo) PaU gty%™j R) dy dr]z dr]d 
> / q(y, Vzi 'He) log Pau q(y£2) R) dy driz dr)e 


(13) 


= J r (q(y,'n z ,'ne),R) 


The variational lower bound T given above has an intimate connection with 
the KL-divergence between q(y, rj z , rj g ) and the conditional density p aux (y, y z , rj g \R) 
pau X (y,v^,ve, R '> can b e expressed as: 


Paux (-R) 


0 < KL(q(y,y z ,y e )\\p aux (y,y z ,y e \R)) = -E q 

= -E n 


J og Paux(y.y z ,Vy\R) 

s q{y,v z <ve) 

Paux(y.y z ,y e -R) 

p a ux(R)q{y.r) z ,ye ) 


= log Paux(-R) - J 7 (q(y.i} z , i) 9 ), R} 

(14) 

We note that when q(y,rj z ,ri 9 ) = p aU x(y, y z , r le\R) the KL-divergence attains 
its minimum value 0, while T attains its maximum value with respect to q (given 
R = (y, z ,W, Hq)) and becomes equal to log p aux (R). On the other hand the 
poorer the approximation that q(y,'n z ,'ne ) provides to p aux (y, il z : VelR)’ the 
larger the KL-divergence and the smaller T (as a function of q) becomes. 

The aforementioned discussion suggests an iterative optimization scheme that 
resembles the Variational Bayes - Expectation-Maximization (VB-EM) methods 
that have appeared in Machine Learning literature |26|. At each iteration t, one 
alternates between (Figure [T]): 
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log Paux{R {t) ) 



1 KL(qM\\p aveB (.\R®)) 


log p aux (R {t 1} ) 


log Paux(R^ ^) 


F(qV,RW) 
KL(q®\\p aux (.\R (t - 1} )) 


F^qW, R^-V 


KL{q^- l )\\p aux {.\R I*" 1 ))) 


F(q( t - 1 \R( t ~ 1 '>) 


VB-E-step 


VB-M-step 


Figure 1: During the VB-E step, optimization with respect to the approximating 
distribution q takes place, whereas during the VB-M step, T is optimized with 
respect to the model parameters R (adapted from |45|) 


• VB-Expectation: Given R {t 1 \ find: 


= fKgmax.J r (q(y,ri z ,ri 0 ),R ( ' t 1} ) (15) 

9 

• VB-Maximization: Given q ( ' t \y 1 r] z , rj g ), find: 

R {t) = argma x.F{q®{y,ri z ,ri 9 ),R) (16) 

Ft 
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q(Uj Vzi Vo) 

Figure 2: Variational Bayesian Expectation-Maximization (VB-EM, 


In plain terms, the strategy advocated in order to carry out the inference task 
explained can be described as a generalized coordinate ascent with regards to T 

(Figure ED- 

2.3. Approximations 

The variational lower bound T (Equation (TT3D ) is the objective function in 
the proposed scheme. In this Section we discuss its form for the utility function 
of interest (Equation ([2])) and an isotropic Q = TQl n . Furthermore we discuss 
necessary approximations that enable the VB-EM steps. We defer discussions on 
the validity and quantitative assessment of these approximations for Section [2~6l 
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In particular, from Equations (fTTH and (fT3lK we have: 


E{q{y, r] z , rig), R) = f q(y, r/ z , y g ) log Pa ™(™ z ™{ R) dy dr} z dr ti 
= E, 


q(y,Vz’Vo) u ' y U " ,J: u " ,e 
U(^e+Ve<t J ‘z+ w y+Vz)Pe(tJ-e+Ve)Pv(y)Pvz (Vz)Pi*z(Vz)Pw{'W) 


Z q{y,Vz,Ve) 

— E q [U(fj, g + rjg, y, z + Wy + rj z )\ + E q 

+ logp^(// 2 ) + logp w (W) 

= EU + Ereg + log p^ z (H z ) + log p W (W) 


Pe(y-9+Vg)Py(y)Pv 

q(y,v z ,ve) 


(17) 


where we distinguish the individual terms: 


Tu = E q [U(n e + rj g ,ii z + Wy + rj g )] 


Ereg Eq 


Pe(^e + Ve)Py(y)PyAv z ) 


(18) 


(19) 


Q(y,^vo) 

We note that in the aforementioned expressions we omit logZ as this does not 
depend on q nor R and therefore does not affect any of the VB-EM results. 
Furthermore, we note that: 


Eu = E q [U(ng + rig,n z + Wy + r] z )\ 

= -ELE q [\u target - u(n e + rj g ,n z + Wy + rj z 


( 20 ) 


is not only analytically intractable but also poses significant difficulties due to 
the computational expense associated with each forward call for the evaluation of 
u(fi g + t] g , y, z + Wy + rj z ). To alleviate these issues we propose a linearization 


(Vz) 
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of the output vector u around for 0 = y, g and z = fx z . In particular: 


u (Ve + le^z + w V + V z ) ~ u (Ve, M 2 ) + G g rj g + G z {Wy + rj z ) 

= M 2 ) + G e rj e + G z Wy + G 2 T7 z 

( 21 ) 

where G g = ||,G~ = evaluated at (n g ,y, z ). These derivatives can be 
computed using adjoint formulations when the forward model is a system of 
PDEs as in the examples considered in Section 3. Such a linearization will lead 
to a quadratic, Gauss-Newton-type, expression upon substitution in the log-utility 
function: 

I ^target ~ u(fj. g + rj g , \i z + Wy + y z )\ 2 « \u target - u(y g , /xj | 2 + r^G T g G g r] e 

+y T W T G T z G z Wy + rgG%G t T, z 

‘2{uf ar g e i /x,.)) ( G g Tj g T G : H y T G z iJz) 

+2 njGjG-Wy + n T ,G T ,(Ge V » + G,Wy) 

( 22 ) 

We note here that a quadratic approximation could also be obtained using a 2 nd 
order Taylor series expansion of \u ta rget — xx(/x e + rj g , /x_ + Wy + ij z ) | 2 directly. 

This would require the computation of the Hessian matrix which is also possible 
using adjoint formulations albeit at a significant additional cost [1], Furthermore, 
for very large d g ,d z » 1 the storage of the Hessian might be impractical. The 
reason for the quadratic approximation advocated is that it leads to closed-form 
expressions for the density q in the VB-Expectation step (Equation (TT51D as it 
will become apparent in Section IZ5l Higher-order approximations would also be 
suitable as long as the latter requirement is satisfied. 

A quadratic expression can be obtained by a 2 nd -order Taylor series expansion 
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of log pe(Hg + Ve) aroun d without significant cost (most often than not, 
analytically) i.e.: 

log Pe(Pe + Vo) ~ log p e (fi e ) + Vo ^qq 6 \^ e + ^ ^qqqqt \n B Ve (23) 

In the case that pe(0) is a multivariate Gaussian as in the examples considered 
i.e. J\f(n eo ,Co 0 ), then the quadratic expression is exact and attains the form: 


log P»(m« + Vt) = -|(m» + v>- /'».! i r C'm ( Mo + % - Meo) 


(24) 


2.4. Prior specification for latent variables and model parameters 

The latent, reduced coordinates y e capture the variation of z around its 
mean p, z along the directions of W as implied by Equation (EJ). It is therefore 
reasonable to assume that, a priori, these should have zero mean and should 


be uncorrelated 


38]. For that purpose we adopt a multivariate Gaussian prior 


(denoted by p y (y) in the Equations of the previous section) with a diagonal 
covariance denoted by C y o = diagir^l),! — 1,... d y . In the examples presented 
in Section [3] r 0jl are set to the same value r y0 i.e.: 


°y0 = Pjl dy (25) 

Alternatively, one can select r^ 1 < r^ 1 < • • • Tq"J which induces a stochastic 
ordering of the reduced coordinates y since 2 is invariant to permutations of the 
entries of the y and the columns of W (Equation ([9])). 

The remaining latent variables r] z account for the part of 2 that is not cap¬ 
tured by y z + Wy (Equation (]£])) and should therefore account for the variance 
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in the subspace orthogonal to TV. The premise in the formulation advocated 
(Figure \3) is that y capture the most sensitive directions (locally) around p, z 
which are much smaller in number than the dimensionality of z i.e. dy « d z . 
The variance of y should therefore be the smallest amongst all possible di¬ 
rections. The remaining directions where the variance is much larger should be 
captured by rj z . We use therefore an isotropic Gaussian as a “prior" for rj z i.e. 
P Vz {t) z ) /s A/"(0, t z{ ) (I — WW 1 )) 0 where the prior variance is set much larger 
than the prior variances of y e.g.: 

r -i 

1 _ TjO_ 

T z0 - e 2 



where e 2 << 1 (Section^). We point out that this is the premise invoked also in 


the context of Sloppy Models [35, 


36 


37], whose behavior depends only on a few 


stiff combinations of parameters (accounted here by TV andy), with many sloppy 
parameter directions largely unimportant for model predictions (accounted here 
by T) z ). We also note here the fundamental difference with PCA decompositions 
which attain the same form as Equation^ . In PCA, TV and the latent variables 
y capture the directions of largest variance and rj z account for the remaining 
variance which is isotropic, smaller, and superimposed on the directions TV. 

With regards to the regularization (prior) specification pw{TV) on TV we 
note that its d y columns Wi, i = 1 ,...d y span the subspace over which an 
approximation of z is sought. We note that z depends on the product TVy 
which would remain invariant by appropriate rescaling of each pair of w\ = ay w { 


5 The covariance (I —WW T ) is obviously improper as it has d y zero eigenvalues to reflect 
the fact that r/ z is inherently [d z — c? !/ )-dimensional 
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and y[ = -Ey i for any ay. Hence, to resolve identifiability issues we require that 
W is orthogonal i.e. W 1 W = I dy where I dy is the d y —dimensional identity 
matrix. This is equivalent to employing a uniform prior on W on the Stiefel 
manifold V dy (R d ») 0. 

The final aspect of the prior model pertains to p gz {pi z ). As this is closely 
related to the physical meaning of the design variables z we make this specific 
for each of the examples considered in Section 3. 


2.5. VB- Expectation-Maximization: Update equations for q(r ] g , y. rj z ) and R 
We consider Gaussian families of approximating distributions q{'r] e ,y,r] z ) 
with the following mean and covariance characteristics: 


E q [r)e\ = 0 


E q [y\ = 0 


_ E q [r) z \ = 0 _ 



EqiVeVj] = C ee E q [r] g y T ] = C dy 
EgivVe] = c e y E q [yy T ] = C yy 
E q [VzVe) = 0 E q [rj z y T ] = 0 


This form is postulated for the following reasons: 


E q [rje^} = 

E q \yr£} = 

Eq[VzVl] = r-\l - 
(27) 


• r] e expresses variations of 6 from its mean pi e (Equation (fTOlH and should 
therefore have a mean zero. 


• y and rj z express variations of z from its mean /x, (Equation ([9])) and 
should also have a mean zero. 


• rj z expresses residual variation (noise) of z from pr, + Wy. Apart from 
having a zero mean, it is assumed to be uncorrelated with r] g as well as y. 

• r) z accounts for variance in the subspace orthogonal to W. Along this it is 
assumed that the variance is isotropic and equal to tT 1 (to be determined) 


0 

0 

- WW T ) 
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Before embarking in the presentation of the expressions for the aforemen¬ 
tioned parameters, we note that q provides an approximation to p aux and there¬ 
fore its marginal with respect to y,rj z can be used to approximate the marginal 
on z i.e. p aux (z ) which is proportional to the expected utility V(z). We note 
that based on Equation (1271) . the marginal q(y,rj z ) will also be a Gaussian and 
there the approximate p aux ( z ) will be a Gaussian with the following mean and 
covariance: 

E[z\ = n z , E[zz t } = C zz = WC yy W T + T~\I - WW T ) (28) 

We can therefore approximate (up to a multiplicative constant) the expected 
utility V(z) as: 

V{z) « V{n 2 + Wy + rj z ) oc (29) 

Let Wj, j — 1,..., d z denote the eigenvectors of C zz and o\ < < ... < 

cr^ the corresponding eigenvalues in ascending order (FigurelH). Consider (local) 
variations A Zj of z from y, z along the distinct directions Wj i.e.: 

A Zj = z — n z = a Wj (30) 


Then: 

1 oP 1 1 oP 1 Q/^ 

f/(Azi) oc e < V(Az 2 ) oc e 2 ^2 < ... < V(Az dz ) oc e (31) 

Hence the expected utility of competing designs Zj will decrease faster for varia¬ 
tions along directions with the smaller variances/eigenvalues which represent the 
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directions of higher sensitivity. 

The methodology developed is based on the postulate that the number 
of sensitive directions is small compared to d z and can be captured by the 
d y « d z latent variables y and the vectors in W. Most of the remaining 
directions are assumed to be sloppy i.e. have a much higher variance and can 
be represented by rj z . The d y sensitive directions can be found by diagonalizing 
C yy = U diag{a\^ dy ) U 1 and as a result: 


W 


W i W 2 ... W dy 


wu 


(32) 


From the expressions of Tu,T reg in Equations (fT71) (TT81E (fT9l> and the ap¬ 
proximations in Equations (1221) and (1241) . we obtain (up to additive constants): 




- r -Y (|Utarget ~ u(y g , n z )\ 2 (from E q [U}) 

+ G T qGq : Cee + W T G T Z G Z W : C yy + r^G^G, : (I - WW T ) 


+ 2 G T e G z W : C ey ) 

~\{^e ~ l^eo) 7 ^C qo (Ho — Meo) 

T yQ T . f ' 

2 ± • ^yy 

_ <lz—dy T z Q 

2 T z 

Cge Cg y 

Cl Cyy 
T^log T Z 

+ logp Mz (/iJ +logp w (W) 


+ |log 


(from E q \p e ]) 

(from E q \p(y)]) 
(from E q \p(r] J]) 

(from E q [q(ri 0 ,y)]) 

(from E q [q(rj z )]) 

(33) 
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Figure 3: Illustration in two dimensions (d z = 2) of p aU x(z) (which approximates 
expected utility V(z)) in the vicinity of (local) maximum /tz z . The most sensi¬ 
tive/stiff directions are captured by W and their variance o\ is accounted by y. 
The most insensitive/sloppy directions are orthogonal to W (along W L ) and 
their variance o\ >> af is accounted by rj z . 

The iterative VB-EM optimization with regards to q and R (Figure [2]) pro¬ 
ceeds then as follows: 

• VB-Expectation: Given the current R find the optimal q (i.e. the optimal 

CfjQi C@y, Cyy , ) . 


f~lOpt s-iopt 

*-'00 '-'dy 

-l 

T Q G T e G e + Cqq 

r Q G T e G z W 

sym. C\% 


sym. 

r Q W T G^G-W + T y0 I 


(34) 
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and: 


T?‘ = r«„ + _l_T Q Gr G Z :(I- WW T ) (35) 

z y 

• VB-Maximization: Given the current q (i.e. C g g, C dy , C yy , r 2 ), find the 
optimal R = {n z , W, fi e }. To carry out this task, it suffices to consider 
only the terms of T in Equation (1331) . that depend on the parameters of 
interest i.e.: 

n° e pt ) = arg max (M*,Mo), = arg max T w (W) (36) 

fJ, z ,Vg W 


where: 


l^o) ~~ 2 (I u target u (f J '9i f^z) I ) 


-|(M» “ MoogCso'tMo - Moo) + lo 8Po.(M s 


(37) 


and: 


^W(W) = - t -§-W t G t z G z W : - fr- ] GlG 2 

- T -§-2G T e G z W : C 0y + logM^) 


= _a W T G r GrW r :(Cw _ TrlJ) 
- T -f2G T e G z W : C 0y + logp w (W) 


Some remarks are warranted at this stage: 


(I-WW 1 


(38) 


The maximization of with respect to (/x 0 , /z z ) can be carried out using 


any nonlinear optimization scheme. We discuss in |Appendix B| a Gauss- 
Newton-type scheme which requires only first-order derivatives of u. We 
note that this is the only part of the VB-EM scheme proposed that requires 
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calls to the forward solver for the computation of u and its derivatives. 
Hence in the context of large-scale, complex models this step controls, to 
a large extent, the overall cost of the proposed algorithm. 


• The updates of (pi g , /x,) in Equation (1371) are decoupled from the rest 
i.e. q and W. This is a direct consequence of the assumption on q 
that E q [y\ = E q [rj z \ = E q [rj e ] = 0 (Equation (1271) ) which was described 
earlier. As a result, the optimal (/x 0 , /jl z ) can be computed beforehand and 
the rest of the VB-EM steps would involve only iterative updates of q (i.e. 
C 0 (), C()y, Cyy, T Z ) and W 


The optimization of Fw with regards to the orthogonal matrix W requires 
appropriate nonlinear constrained optimization tools. A highly efficient 


such tool is discussed in |Appendix A1 We reiterate that this step does not 
require any further calls to the forward solver. 


We also point out that the proposed algorithm inherits all the 
traits of Expectation-Maximization algorithms as discussed in 


avorable 


47]. As 


explained therein it suffices that the updates for q (VB-E-step) or R (VB- 
M-step) lead to an improvement of the variational bound F rather than 
being the locally optimally values. This would for example allow for only 
partial updates of q (e.g. updating Coe only every few iterations) or incre¬ 
mental improvements of W that simply lead to an increase in Fw without 
finding the local maximum. Such strategies could expedite significantly 
the computations involved. 


• Finally we note that implicit to the aforementioned derivations is the as¬ 
sumption of a unimodal density on the latent variables and as a result 
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a unique, global maximum for z (Equation (I28D ). This assumption can 
be relaxed by employing a mixture of Gaussians (e.g. 48]) that will en¬ 

able the approximation of highly non-Gaussian and potentially multi-modal 
Paux which in turn can reveal multiple local maxima of the expected utility 
V(z). Such approximations could also be combined with the employment 
of different basis sets W for each of the mixture component i.e. differ¬ 
ent sensitive/sloppy directions for each local optimum. We defer further 
discussions along these lines to future work. 


2.6. Validation - Assessing the accuracy of approximations 

Thus far we have employed the variational lower bound in order to identify 
the optimal dimensionality reduction and to infer the latent variables that approx¬ 
imate Paux{z) (Equation (125} ) and the expected utility V(z) (Equation (I29D V 
The goal in this section is to propose quantitative indicators that assess the ac¬ 
curacy of the VB approximation. To that end we consider the Kullback-Leibler 
divergence KL(q(y,r) z ,ri e )\\p aux {y,r) z ,ri e \R)) (Equation flU}) that motivated 
the VB-EM scheme discussed. In particular: 


KL(q(y,r] z ,rj g )\\p aux (y,ri z ,'n d \R)) = -E q 

= -E n 


j og . paux(y.'n : -Vo\R) 
q(y,Vz,vo) 

Paux(y,V^Vy.R) 

Paux {R)q{y,Vz^ve) _ 

Paux{y-y z -ye- R ) 

Paux{R)q(y,v z ,vg) 

(39) 


= log p aux (R) - E, 


where p aux {y, "Hz? Vei R) ' s given in Equation (HID and log p aux (R) in Equation 
(112} . We propose estimating both terms in Equation (|39D using Importance 
Sampling with q(y, r] z , rj g ) as the Importance Sampling density |2jj]. If we denote 
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by: 


(40) 


w{y,Vz,Ve) 


Paux(y,-n 2 ,rj g ,R) 

viy^ziVe) 


the (un-normalized) importance weights, then by drawing samples {y^ m \ rfg H ' > }^ =1 

from q(y, r] z ,r ] g ) we obtain that: 


M 


log < w >= log ( — ™(y (m \ y { e n) ) ) —> io gp aux (R ) (4i) 


m= 1 


and: 


M 


<logw>=—J2 i °S'"'(y <m) . V™) — t- E, 


m= 1 


log 


Paux(y,y z ,Ve^ R ) 


q(y,riz,Vo) 


(42) 


In summary, by employing Importance Sampling we can estimate: 


KL(q(y,Vzi'ne)\\Paux(y,y z ,y e \ R )) ~log < w > - < logw > (43) 

We note that sampling from q(y, y z , rj e ) is straightforward due to its Gaussian 
form but the evaluation of the weights require the computation of the actual 
utility i.e. running the exact forward model. We point out however that this is 
done solely for the purposes of validation. Given that the JlL-divergence is not 
bounded from above and in order to compare it when considering various values 
of d y i.e. the dimension of the reduced coordinates y, we propose normalizing 
it with the entropy H(q ) of the multivariate Gaussian q(y,T) z ,r) g ) which can be 
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exactly computed as: 


H{q) = - dd + dy log2vr - i log 



In the examples that follow we report therefore the following normalized KL- 
divergence: 


KL(q(y,ri 2 ,ri g )\\p aux (^JJzdnelfy) 

H(q ) 


(45) 


nKL 


In all the expressions above we use R opt as found at the last iteration of the 
VB-EM scheme. 

3. Numerical Illustrations 

In this section we discuss the numerical results obtained in the analysis of 
two examples. In both cases the forward model consisted of an elliptic PDE. 
The discretization of the forward problem for the computation of outputs u as 
well as of the adjoint problem for the computation of the derivatives Go, G z 
was performed using standard finite element tools. Both problems involved a 
very high number of random variables dg arising from the discretization of a 
random field with small correlation length (in relation to the problem domain). 
Especially the second example involved a very large number of design variables 
d z . A summary of the basic dimensions/quantities is contained in Table [1] 

With regards to the regularization (prior) terms, in both problems we em¬ 
ployed t~q = 10 4 (Equation (|25ll ) and e 2 = 10“ 10 (Equation (|26il ). Details 
about the p^, z {li z ) are given for each example separately. With regards to the 
VB-EM scheme employed we note that at each iteration, 100 W —updates were 
performed according to the equations detailed in|Appendix A 
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Random Variables 6 

Design Variables z 

r Q 

n = dimiutarget) 

Num. Illustration 1 

d e = 1600 

d z = 21 

0.01 

11 

Num. Illustration 2 

d e = 3536 

d z = 3536 

5 x 10“ 6 

8 


Table 1: Basic quantities/dimensions 


3.1. Numerical Illustration 1 

The goal of this problem is to optimally select the input to a random, hetero¬ 
geneous medium so as to maximize an expected utility related to the response. 
In particular we consider the rectangular domain = [—1,1] x [0,1] of Figured] 
and the steady-state heat diffusion with a governing PDE: 

V • ( — A(a3)V-u(a;)) =0, x e int(Yi) (46) 

The boundary conditions are u = 0 on Yu, —A(x) dT d (x) = 0 on Tat. The design 
variables z parametrize the flux on the left hand boundary. 

Figure 4: Problem Configuration for Numerical Illustration 1 

The uncertainties 6 parametrize the conductivity field A(:r). In particular we 
consider a statistically-homogeneous, log-normally-distributed random field with 
mean 1 and coefficient of variation 0.50. This is defined through a transformation 
of a statistically-homogeneous Gaussian field X g (x) as: 

X(x) = e Xa{x) (47) 

The following autocovariance C g {Ax\,Ax 2 ) for A 9 (;e) is employed: 

C g ( Ax 1 ,Ax 2 ) = ffg exp{— Xl + A cTg = 0.223 (48) 

Xq 
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(i) sample 1 
(iii) sample 3 


(ii) sample 2 
(iv) sample 4 


Figure 5: Sample realizations of the conductivity (example 1) - Young’s modulus 
(example 2) field A(x) as prescribed in Equation (14711 


where a correlation length of x 0 = 0.1 is used. We note that the correlation 
length is small in relation to the dimensions of the problem domain and as a result 
a large number of random variables 0 are required. In particular we discretize 
the problem domain into 1600 triangular, finite elements E and model with 0 the 
value of \ g (x ) at the centroid of each element. This gives rise to dg = 1600 
and a pg (Equation (EH)) with mean fi eo = —0.112, variance a 2 = 0.223 and 
covariance matrix Cg o obtained from Equation (1481) . Sample realizations of the 
conductivity field X(x) are depicted in Figure [5] for illustrative purposes. 

We employ a design variable z for each node along the left-hand boundary 
of the problem domain (Figured]) resulting in d z = 21 design variables. Finally, 
with regards to the utility function U, we use temperatures along X\ = 0,£ 2 G 
[—.25,0.75] (red line in Figured]) and in particular at 11 equidistant points with 
x 2 ,fc = 0.25 + 0.05(fc — 1), k — 1 -G 11. The target temperature vector u target is 
set to: 


^target, k 20 40|x 2 ,fc 0.o| 


(49) 


and Tq 1 = 0.01 (Equation (|8])). We finally note that a vague Gaussian regular¬ 
ization/prior was employed for /x 2 such that PimzM = -^(O. °zo = 10 10 J). 
Figure |6] depicts the computed [i z as a function of the number of iterations 


6 we consider a 40 x 20 regular grid and each rectangle is divided along its diagonal into 
two triangles 
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Figure 6: Computed ^i z (see |Appendix BQ as a function of the iteration number. 
In example 1 this expresses the flux on the left boundary X\ = 0, x 2 6 [0,1]. 
Each iteration involves a forward call for the computation of the output u and 
its derivatives. 


( Appendix B ). As it can be seen, convergence is attained with as few as 20 
forward calls. We re-emphasize that these are the only forward solutions required 
for the computation of the outputs and their derivatives. We note that while 
the linearization in Equation (I2TD with respect to z is exact, this is not the case 
with regards to the random variables 6. This is due primarily to the nonlinear 
dependence of the response on the conductivity field A(a?). 

The evolution of the the variational lower-bound T (Equation (1331) ) with 
regards to the iterations alternating between q and W updates is shown in Figure 
I7T1 We note that these iterations do not entail any additional forward calls. Figure 
[TTT] depicts the evolution of the identified crj per VB-EM iteration where as it is 
clearly seen, there exist 3 “stiff” generalized eigenvectors with small values for the 
corresponding generalized eigenvalues. One also notes that the variances top-off 
at the prior value r^ 1 = 10 4 . These 3 most sensitive generalized eigenvectors 
W (Equation (132D ) and the associated variances are shown in Figure [8] The 
numbers in parentheses were the computed variances when the calculation was 
repeated for exactly the same problem but by assuming a coefficient of variation 
of 0.71 = \/o75 (instead of 0.50) for the conductivity field A(£e). The most 
sensitive eigenvectors were identical (and therefore not plotted) but, as expected, 
their sensitivity is reduced or equivalently the corresponding variances were larger. 
Figure [9] compares the [i z computed for these two cases where one notes that 
while the shape is the same the amplitude/range is different. 
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(i) Evolution of T (Equation (I33|I V 


(ii) Evolution of a 2 


Figure 7: VB-EM Each iteration corresponds to one q (Equations (l34lf . (l35ll ) 
and one W (Equation (I38D ) update 


(i) a? = 4.0 x 10" 2 
(5.9 x 10" 2 ) 


(ii) a\ = 1.5 x 10 3 
(1.6 x 10 3 ) 


(iii) cr 2 = 6.6 x 10 3 
(6.8 x 10 3 ) 


Figure 8: First three most sensitive eigenvectors {i& J } 3 =1 (Equation (I32D ) and 
associated variances cr 2 . We note that cr 2 /cr 2 = C>(10 5 ) 

Figure [10] depicts sample designs drawn from q([i z + Wy) (which approx¬ 
imates the expected utility V(/jl z + Wy)) corresponding to different (relative) 
levels of the the expected utility. While in the approximation advocated /u z rep¬ 
resents the optimal design for which V(z) attains its (locally) maximum value, 
by considering expected utility values V(z) less than the optimal we can identify 
an infinity of alternative designs but also assess the sensitivity of the solution. 

Finally in Table [2] we record the normalized KL-divergence as discussed in 
Section IZ6l and note that this decays for increasing d y to relatively small values 
indicating a good quality in the approximation found. 

3.2. Numerical Illustration 2: Stochastic Topology Optimization 

The vast majority of studies in the context of stochastic topology optimization 
consider uncertainties in the loads (i.e. input) of linear systems Qj. This 
allows one to find closed-form expressions for the random response and perform 
the integrations needed much more easily. Recently notable efforts have been 

Figure 9: Comparison of /jl z computed when the conductivity field A(a;) has a 
coefficient of variation (cov) of 0.50 and 0.71. 
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<") ngr = 


(i) 


V(z) 


0.95 


C" 1 ) ptSt = 050 


H V $) = °- 25 


Figure 10: Alternative designs z at various levels of expected utility as 

compared to the optimal [i z 


dy 

nKL (Equation (l45l)[) 


1 

1.5 x 10” 1 


2 

1.2 x 10- 1 


5 

4.7 x 10~ 2 


10 

2.5 x 10~ 2 


20 

9.8 x 10~ 3 



Table 2: Normalized KL-divergence from Equation (l45l) for example 1 


made towards addressing the significantly more complicated problem involving 
geometric and/or material uncertainties j^c]]. Some of the proposed solution 
strategies employed perturbations techniques jjy], 521 whose performance decays 
as the random variability around the mean and/or the number of random variables 

n 

increases . Other attempts have made use of intrusive |53| and non-intrusive 
[ 54 . 551 versions of (generalized) Polynomial Chaos (gPC) in order to address 
the stochastic components. 

We consider the two-dimensional domain Q = [0,1.6] x [0,1] in Figure [1171 
The goal is to identify where the material of interest should be placed in order 
to achieve the objectives (subject to appropriate constraints) to be discussed. 
We can therefore partition Q into which contains all points where material 
is placed and O 0 = O Oi which corresponds to the points without any material 
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(void). The governing differential is that of elastostatics: 


V • (D(x)e(u(x))) = 0, x G int(Q) 


e(u(x)) = 


du\ 

dx\ 

du2 

8x2 

du\ | du2 
dx2 ' dxi 


(50) 


where u(x) = 


u 1 (x 1 ,x 2 ) 

u 2 (x 1 ,x 2 ) 


is the displacement field, D is the (plane-stress) 


elasticity matri 


ixEl i.e. 


D(x) = 


0 0 1 - v 

modulus. Its spatial variation can be modeled as: 


1 v 0 

v 1 0 


and E(x) is the Young’s 


0 if x G f2 0 

E(x^) -Emin lfli(*)(^(*) E'min)) lf2i(*®) \ (51) 

1 if x G 


The value of E min = 10~ 10 (instead of 0) is used to avoid numerical issues in 
the solution of the governing equations. With regards to boundary conditions it 
is assumed that u = 0 along T D and traction-free along T N (Figure [TlT]) with 
the exception of a point force P = 10~ 3 at {x\ = 1.6, x 2 = 0). 

In deterministic formulations, X(x) is assumed constant. In the context of 
the analysis pursued in this study we are interested in exploring the case where 
X(x) not only varies spatially but also exhibits stochastic variability i.e. A(a?) is 
a random field. The model adopted for X(x) is identical to that in Example 1 


7 v = 0.3 (constant) in this study 
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which we repeat here for completeness. In particular we define A(a:) through a 
transformation of a statistically-homogeneous, Gaussian random field X g (x) as 
in Equation (1471) . The latter has a mean (constant) /i g = —0.112 and autoco¬ 
variance C g (A xi,Ax 2 ) as prescribed in Equation (f4Sll with a correlation length 
Xq = 0.1 and a variance a 2 g = 0.223. This gives rise to a log-normally distributed 
A(a;) with mean 1 and coefficient of variation 0.50. 

The problem domain Q is discretized using a regular mesh of 3536 triangular 
elements [®|. The vector of random variables 6 represents the values of \ g {x) at 
the centroid of each element. This gives rise to do = 3536 and a po (Equation 
flMD) with mean /x eo = —0.112, variance a 2 = 0.223 and covariance matrix C 


60 


obtained from Equation (1481) . We note that, as in Example 1, a small correlation 
length is selected giving rise to a large number of random variables 6. 

Normally the design variables 2 should be binary and discretize the indicator 


function 

schemes 


i2i 


x) in Equation (I5TD []. As in deterministic topology optimization 
58] and in order to be able to compute meaningful derivatives with 


respect to the design variables we adopt a relaxation of the problem. In order to 
represent the variations of the elastic modulus E(x) (Equation (l5lD ). we employ 
the sigmoid function to transform a real-valued field z(x) as follows: 


E(x) = E„ 


1 + e _2< AA 


(A(*) - E n 


(52) 


While the sigmoid function ensures that E(x) € [E min , X(x)] as in Equation 


°we consider a 52 x 34 regular grid and each rectangle is divided along its diagonal into 
two triangles 

9 We note that in deterministic formulations level-set-based representation have also been 
adopted e.g. |56, 57] 
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(l5lH it does not necessarily yield a hard partitioning (0 — 1) of Q as required in 
such problems. To achieve this i.e. to promote solutions where z(x) —> —oo (i.e. 
E(x ) — > E min ) or z(x) —> Too (i.e. E{x) —> X(x)) we adopt an appropriate 


hierarchical prior/regularization p z (z) that is discussed in detail in Appendix C 


Naturally the vector of design variables 2 represents the values of the field z(x) 
at the centroid of each finite element (as we did for the random variables 6) 
resulting in d z = 3536 design variables (Table [T]). 

More importantly though the problem formulation is only meaningful with the 
introduction of a constraint on the volume of material that should be used i.e. 
the volume fraction VF = ■ This in turn implies an equality constraint 

for the design variables z which can be written as: 


1 


c z = 


tE 


d z I 1 + e z i 
3 =1 


VF = 0, 


(53) 


where VF is the targeted volume fraction^. In order to account for this nonlin¬ 
ear constraint in the proposed framework where the design variables z are treated 
as random variables, we propose expanding the target, auxiliary p aU x{0, z ) (Equa¬ 
tion ((6])) as follows: 




Paux{0,z)(xe ~^EU(G,z)p e (d) p z (z) 


(54) 


Clearly this represents a soft, probabilistic enforcement of the aforementioned 
constraint where for small e^, the target density p aux . ar| d therefore the associated 

10 ln the example considered, the area of each finite element is the same. If this does not 
hold, the constraint has to be adjusted appropriately without loss of generality 
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(ii) Deterministic solution for VF = 0.4 
(i) Problem configuration obtained by setting Var[0] = 0 

Figure 11: Problem domain, boundary conditions and deterministic solution 

z, are contained in the vicinity of the manifold implied by Equation (1531) . The 
additional term in p aux in Equation (1541) partially alters the associated update 
equations of the VB-EM scheme previously presented. We discuss these in detail 
as well. In the examples presented the value e 2 c = 10~ 10 was 

used. 

For the complete definition of the problem, we note that the target response 
vector ut arget consisted of the vertical displacement u 2 at 8 points along the 
bottom boundary i.e. with x 2 = 0 and x\ = 0.2 k, k — 1 -P 8 such that: 


in 


Appendix C 


^target,k 6.25 X 10 k 


(55) 


and Tq 1 = 5 x 10~ 6 (Equation ([8]) ). For comparison purposes, the deterministic 
problem was solved for VF = 0.4. To that end, the exact same algorithmic 
scheme for finding p, 6l p, z was employed (Appendix C) by assuming that the 
variance of the random variables 6 was zero and their mean exactly the same as 
detailed above. The resulting pr z which is shown in Figure fTTTTI was obtained after 
(approximately) 50 iterations and exhibits two diagonal ribs that are obviously 
critical in stiffening the system. As it is easily understood, the objective function 
is not (in general) concave and multiple local maxima could exist. 

Figure H2] depicts the estimated pi z for the stochastic problem and for two 
volume fractions considered i.e. VF = 0.4 and VF = 0.2. The first was 
obtained with 35 forward calls whereas the second with 54. As compared to 
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(i) VF = 0.4 


(ii) VF = 0.2 


Figure 12: Computed n z (see Appendix B and Appendix C ) as a function of the 
iteration number. Each iteration involves a forward call for the computation of 
the output u and its derivatives. For VF = 0.4 and VF = 0.2 the computation 
required 35 and 54 such calls respectively. 


(i) VF = 0.4 


(ii) VF = 0.2 


Figure 13: Evolution of T (Equation (I33D ). Each iteration corresponds to one q 
(Equations (J34j), (|35ll ) and one W (Equation (EH])) update 


the deterministic solution in Figure ITTiTI with the two diagonal stiffening ribs, 
one notes that in Figure [1271 only one is present. This could be attributed to a 
different local maximum or it could be the result of the random variability in the 
properties of the material. 

More importantly the algorithm proposed can identify the most sensitive di¬ 
rections around the local maximum. These are obtained through successive iter¬ 
ations between q (Equations (1341) . (1351) ) and W updates (Equation (138D ). The 
evolution of the the variational lower-bound T (Equation (13311 ) with regards to 
these iterations is depicted in Figure |T3] We note that these iterations do not en¬ 
tail any additional forward calls. Some of the generalized eigenvectors identified 
W (Equation (132D ) and the associated variances are shown in Figure [15] Due 
to the presence of the constraint, the first (most sensitive) such eigenvector is 
determined by the gradient of the constraint at n z and the associated variance of 
(in parentheses, Figure [T5D by the user-specified parameters e c (Equation (154D ). 
Figure [14] depicts the evolution of the identified Oj per VB-EM iteration where 
as it is clearly seen the first, most sensitive generalized eigenvectors are identified 
in the first few iterations. One also notes that the variances top-off at the prior 
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(i) VF = 0.4 


(ii) VF = 0.2 


Figure 14: Evolution of a 2 


(i) {aj = 7.31 x 10' 1 ) 

(ii) o\ = 1.25 x 10 2 

(iii) erf = 2.78 x 10 3 

(iv) crj = 1.36 x 10 4 


(v) {aj = 3.24 x 10°) 

(vi) a\ = 1.93 x 10 2 

(vii) erf = 1.91 X 10 3 
(viii) of = 1.99 x 10 4 


Figure 15: Generalized eigenvectors Wj for VF = 0.4 (left column) and VF = 
0.2 (right column) 


value t~q = 10 4 . 

Figure |T6] depicts the squared values ( Wj) 2 (shown in Figure fT5i) in a log- 
scale. This allows one to see how the sensitivity associated with each generalized 
eigenvector is spatially distributed. Finally Figure [18] depicts the outlines of 
sample designs drawn from q(/jL z + Wy) (which approximates the expected utility 
V{/j, z + Wy )) corresponding to different (relative) levels of the the expected 
utility. In the approximation advocated, fi z represents the optimal design for 
which V(z) attains its (locally) maximum value. By considering V(z) less than 
the optimal, we can identify an infinity of alternative designs but also assess the 
sensitivity of the solution. 

Finally in Table [3] we record the normalized KL-divergence as discussed in 
Section IZ6l and note that this decays for increasing d y to relatively small values 
indicating a good quality in the approximation found, particularly for VF = 0.4. 
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(i) (.X,!) 2 

(v) (wrf 

(ii) (w 2 ) 2 

(vi) ( W 2 ) 2 

(iii) (xu 5 ) 2 

(vii) (w 5 ) 2 

(iv) ( w 7 ) 2 

(viii) (w 9 f 


Figure 16: The squares of the entries of each of the generalized eigenvectors Wj 
(log scale) for VF = 0.4 (left column) and VF = 0.2 (right column) 


(i) 


YM 

V(vl z ) 


0.75 


00 


YM 

V(Pz) 


0.50 


(iii) 


YM 

V(p z ) 


0.25 


Figure 17: Outline of alternative designs z at various levels of expected utility 
as compared to the optimal /x, (VF = 0.4) 



nKL (Equation ((45])) 

dy 

VF = 0.4 

VF = 0.2 

5 

1.5 x 10~ 2 

3.4 x 10' 1 

10 

8.7 x 10" 3 

1.9 x 10” 1 

15 

3.9 x 10" 3 

1.3 x 10" 1 

20 

6.0 x 10” 4 

6.8 x 10" 2 


Table 3: Normalized KL-divergence from Equation (l45il for example 2 


4. Conclusions 

We present a framework for solving a large class of model-based, optimization- 
under-uncertainty problems. The overarching idea is that of recasting the prob¬ 
lem as one of probabilistic inference. This enables the uniform treatment of 
both random and design variables and is capable of furnishing not only a (local) 

(0 V$) = 0-75 (ii) = 0.50 (iii) igj = 0.25 

Figure 18: Outline of alternative designs z at various levels of expected utility 
yf~ z \ as compared to the optimal /x, (VF = 0.2) 
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maximum (i.e. a point estimate) but also the sensitivity of the objective to the 
design variables. To achieve this objective, we propose a Variational Bayesian 
framework that operates on two fronts. Firstly, it attempts to compute efficiently 
an accurate approximation of the joint density of interest. Secondly, it seeks a 
lower-dimensional subspace with regards to the design variables z that provides 
an assessment of the solutions robustness by discovering the most sensitive direc¬ 
tions i.e. the directions along which, variations in z will cause the largest decrease 
in the expected utility. This is based on the same premise as the so-called Sloppy 
Models whose behavior depends only on a few stiff combinations of parameters, 
with many sloppy parameter directions largely unimportant for model behav¬ 
ior. The identification of this lower-dimensional subspace, enables the analyst to 
compute, apart from the optimal design, an infinity of alternative designs which 
achieve a lower value of the expected utility. Interestingly enough, addressing the 
probabilistic inference task under the Variational Bayesian perspective involves 
the solution of an optimization problem. To that end we propose an iterative 
VB-Expectation-Maximization scheme. 

The aforementioned claims have been validated in the context of two nu¬ 
merical examples involving 0(1O 3 ) random and design variables. In all cases 
considered the cost of the computations in terms of calls to the forward model 
was of the order 0(10 10 2 ). The accuracy of the approximations provided is 

assessed by appropriate information-theoretic metrics. 

The framework proposed cannot currently account for the possibility of mul¬ 
tiple local maxima, as the approximation constructed is based on unimodal Gaus¬ 
sian densities. Nevertheless, the formulation can be readily extended by employ¬ 
ing mixture of Gaussians that will enable not only approximations for multi-modal 
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cases but also produce better results for unimodal, but highly non-Gaussian den¬ 
sities. We note finally the possibility of using approximate, surrogate or reduced- 
order models in order to expedite computations. All the algorithmic steps dis¬ 
cussed can be readily performed by using these less-expensive forward solvers. As 
long as these convey some information about the expensive, reference forward 
model, then they can provide a good starting point for further computations that 
would require fewer expensive calls to converge. 
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Appendix A. Maximization of Tw 

As discussed earlier, in order to update W it suffices to consider only Tw{W) 
(Equation (138D): 


T w (W) — —^-W G z G Z W : [C yy — r” 1 /) 
-f 2 GjG z W : C 0y + log Pw (W) 


(A.l) 


While the first part is quadratic with respect to W the difficulty arises from the 
orthogonality constraint W T W = I which can be enforced directly or through 
the regularization term pw(W) as previously discussed. To address this con¬ 
strained optimization problem, we employ the iterative algorithm proposed in 


59] which is highly efficient not only in terms of the number of iterations needed 


but also in terms of the the cost per iteration. It is based on the constraint¬ 
preserving Cayley transform according to which the current W is updated to 
W' as follows: 

n n 

(A.2) 


W' = {I+^A)-\l- a -A)W 


where: 


A = JW 1 - WJ 1 


(A.3) 


and J = ■ The latter can be readily obtained from Equation (IA.1D : 


J = -T Q G T z G z W(C yy - r- 1 /) - T Q G T z GeCoy (A.4) 


It can be shown that W' satisfies automatically the orthogonality constraint and 
that and for a = 0, W' is an ascent direction of Tw- Several options exist 
for selecting the step size a. In the numerical illustrations we made use of the 
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Barzilai-Borwein scheme detailed in 16 


which results in a non-monotone line 


search algorithm. We note that the inversion of the d z x d z matrix (I + %A) 
can be efficiently performed by inverting a matrix of dimension 2 d y x 2 d y which 
is much smaller than d z [59] ■ We finally re-emphasize that the updates of W 
require no forward calls. The updates/iterations are terminated when no further 
improvement to the objective Tw is possible. 

Appendix B. Maximization of 

As it was previously discussed, in order to update fi g , n z it suffices to consider 
only (Equation ([37])): 


Me) — 2 (I U target U {l l e^z)\ ) 

-\(Vo ~ Veo) Tc eo(Ve ~ Meo) + log 


(B.l) 


This represent a nonlinear, unconstrained optimization problem that can be 


solved with any of the well-known algorithms [61,162]. We present here a Gauss- 
Newton type algorithm that we employed and produced the results discussed 
in Section [3] For clarity of the presentation we consider first the case in the 
first numerical illustration where the regularization/prior was a Gaussian 

J\T(0,C z0 ) in which case: 


FniPziPe) = 


2 (| u target l l z)\ ) 

~\{^6 — — /%)) — \n 1 z C zQ H z 


(B-2) 
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If (fAz\Hg^) denote the values at iteration t and (//l f+1) = + A/j, < £\ /^ /+1) = 

+ A/z^), then a first-order Taylor series yields the following approximation: 


•7>(A/^\ A nf) « . |n tarffe i - - G e , t Atif - G® t A/x£ t} )| 2 ) 

-|(rf } + a /4* } - M0o) T C , 0 O 1 (m? ) + A Me } - /%)) 

-§(/** } + A/ii* ) ) T Cj 0 1 (^i t) + A/x£ 4) ) 

(B.3) 

where and G Z)t = Differentiating with respect to 

(Afj,z\ Anf^) leads to the following system of coupled linear equations: 


where: 


and: 




djj t] 

- 


0 

0 


-)• H t 


A /4° 

A M « 


h t 


H t 


T Q^e,t^o,t + C 9 q tqG^jGzj 
r Q Gl t G e ,t T Q G T zt G z j + CzJ 


(B.4) 


(B.5) 


TQG T et (u target - U(tXz\ H^)) - Cg^nf - flgo) 
T Q G T Z)t (u target - u(n [ z\n^)) - C-Qnf 


We note that at each iteration the forward solver needs to be called for 


the computation of the output vector u and its derivatives Gq }Z . Iterations 

are terminated when no further improvement is possible i.e. < 

I Me I ' 1/4 1 

(■ tolerance ) = 10~ 5 . 
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Appendix C. Regularization of /x z and update equation for Num. Illus¬ 
tration 2 

We discuss in this section the definition of the regularization/prior Pn*(t*z) 
for numerical illustration 2 (Section 13.21) and the resulting changes in the opti¬ 


mization scheme for pi, in |Appendix B[ Given the physical interpretation of the 
design variables p,, as binary variables which for each pixel indicate the presence 
or not of material, we adopt a regularization for p z that promotes the discovery 
of such solutions but also exhibits the requisite spatial correlation. To that end 
we propose a hierarchical prior where in addition to p, = {p z ,j}^=i we introduce 
the binary hyperparameters (j) = {(f)j = ±1} 3 £ 36 such that: 


3536 


PnAUzl^) = ftp^j 

i =i 


(C.l) 


where p{p z ,j\4>j = —1) = A/"(— m,s 2 ) and p{p z ,j\ ( t ) j = +1) = N{m,s 2 ). The 

value of m was selected so that in combination with the sigmoid function (Equa¬ 
tion (I52D ) produces solutions close to the binary images we would like to achieve: 


1 + e r ‘ 


= 10 


-3 


0, 


1 + e~ 


= 1-10 


-3 


(C-2) 


This yields m = —6.9 and the resulting, bimodal, hierarchical prior is depicted 
in Figure IC19il In order to account for the spatial dependence of neighboring 


p Z j we employ an auto-logistic hyperprior on </> of the following form 
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Mfl: 


p(cj)\/3) oc e 2 <t>j<t>k 


(C.3) 
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The second sum in the expression above is over all indices k which correspond 
to sites neighboring to j (neighborhood relation denoted by ~). Given the 
triangular mesh used, we consider 3 neighbors for each site j as shown in Figure 
1C . 19i il The hyperparameter /3 controls the strength of spatial correlation. At 
one extreme, if (3 —> +oo, neighboring <pj prefer to have different values (i.e. 
—1/ + 1 or +1/ —1) as this yields a higher hyperprior value. At the other extreme, 
if /? —* — oo, neighboring dj prefer to have the same values (i.e. —1/ — 1 or 
+1/ + 1). For (3 = 0 no correlation is present. We note that the aforementioned 
prior in (p imbues indirectly spatial correlation in fi z j. 

In summary, the prior ^(/fj can be found by integrating out the hyperpa¬ 
rameters (p and (3 as: 

PvSPz) = I ^(/cI^pOI/ 3 ) d HP (c.4) 


The integration above cannot be performed analytically and for that reason we 


employed an Expectation-Maximization scheme 
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whereby at the Expecta¬ 


tion step a Metropolized-Gibbs scheme is used to sample the hyperparameters <p 
and f3 from their conditional posterior (given the current value of fi z ). This does 
not require any forward calls and can be very efficiently performed. The samples 
generated can be used to estimate logp^ (p, z ) and its derivatives as needed for 


the update equations in Appendix B If we denote with < > expectations with 
regards to the posterior samples of <p described above and by keeping only terms 
that depend on /x. we obtain that: 


logP/J/c) = -2? < (/-Gj - m ^j ) 2 > 

= ~2 MmT/G - 2m vl <(/>> + < 0 1 (/> >) 


(C.5) 
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(i) Bimodal prior (n Z j\Ij) (ii) Neighborhood structure 

Figure C.19: Definition of *V,(/0 


The quadratic form of this expression implies that the only changes in the update 
Equation (IB.41) in|Appendix B| will be in: 


and: 


H t = 


T Q^e,tGe,t + C 9 q tqGqjG z j 

T QG^ t Ge !t r QGl }t G Z)t + jzl 


(C.6) 


h< = 


t qG-q t (u targ et ~ u{^\ fl^)) - - fl eo ) 




T Q G T z t {u target - u(iUz\ /x£ £) )) - ^ (/x£ J - m < 4> >) 


(t) 


(C.7) 


The aforementioned equations should be augmented by the equality con¬ 
straint in Equation (1531) . We enforce this constraint directly on fi z so as the 
optimal design (/x z ) satisfies it. From an algorithmic point, the process adopted 
is similar to Sequential Quadratic Programming (SQP, [61]) where the quadrati- 


cized objective (Equation (IB.31) ) at each iteration t is augmented by the linearized 
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constraint: 


0 = c(a*W + A/xf) « c(a*?) + fj Aa*W 


(C-8) 


where f t = ^\ z= ^. 


In order to account for the constraint in the rest of the auxiliary density p aU x, 
the scheme described in Equation (l54l) is adopted which induces a soft/probabilistic 

c 2 (g) 

enforcement. The term 2e = will therefore yield an additional contribution in 
the variational lower-bound T detailed in Equation (jl7l) . If we denote by T c this 
additional term, then: 



(C.9) 


-c 


= + Wy + r, z )\ 

c 


Given the nonlinear form of c(z), we employ another linearization around /x,: 

+ + « c(/xj + / r (TF?/ + T 7 j 


(C.10) 


/ 7 (Wy + Vz) ( since c (mJ = °) 


where f = ||| z=/x . As a result of this and the form of q (Equation (T2Th ). 
Equation (1C.91) becomes: 


T c =-^{W T f fw -.Cyy + T-'f f :(I-WW T )) (c.ll) 


which when combined with the rest of the terms in T in Equation (l33i) . leads to 
the following changes in the update equations in the VB-EM scheme: 
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• VB-Expectation: 


/ '~fOpt a^Opt 

° 6>6> '-'Oy 

-1 

T Q G T e Ge + CgQ 

r Q G T e G z W 

_sym. C% _ 


sym. 

t q W t G t z G z W + r y0 I + f T W 

C c _ 


C c 

(C.12) 


and: 

T °”‘ = T *> + + 4 / f T ) : ( f - (c.13) 


• VB-Maximization: 


W° r ‘ = argmax P w (W) 


(C.14) 


where: 


Fw{W) = 


-{I9-W t G t z G z W + ^ W T f f T W) : (C w - t~ 1 I) 
- T -f2G T e G z W : C ey + log p w (W) 

(C-15) 
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